59 research outputs found

    Forecasting of commercial sales with large scale Gaussian Processes

    This paper argues that applications of Gaussian Processes in the fast-moving consumer goods industry have not received enough discussion. Yet the technique can be valuable: for example, it can provide automatic feature relevance determination, and the posterior mean can unlock insights into the data. Significant challenges are the large size and high dimensionality of commercial point-of-sale data. The study reviews approaches to Gaussian Process modeling for large data sets, evaluates their performance on commercial sales data, and shows the value of this type of model as a decision-making tool for management. Comment: 10 pages, 5 figures
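As an illustration of the "automatic feature relevance determination" mentioned above, here is a minimal sketch of a squared-exponential kernel with one lengthscale per input dimension. The abstract does not say which kernel the paper uses, so the function name `ard_se_kernel` and its parameters are assumptions made for this example only.

```python
import math

def ard_se_kernel(x, y, lengthscales, signal_variance=1.0):
    """Squared-exponential kernel with a separate lengthscale per dimension.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Scale each coordinate difference by its own lengthscale before squaring.
    sq_dist = sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, lengthscales))
    return signal_variance * math.exp(-0.5 * sq_dist)

# A very large lengthscale on the second feature makes the kernel almost
# insensitive to it, which is how a fitted model can flag a feature as irrelevant.
k_irrelevant_change = ard_se_kernel([0.0, 0.0], [0.0, 5.0], [1.0, 1e6])
k_relevant_change = ard_se_kernel([0.0, 0.0], [1.0, 0.0], [1.0, 1e6])
```

When the lengthscales are learned from data, dimensions that end up with large lengthscales contribute little to the covariance, so inspecting the fitted values gives a rough ranking of feature relevance.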

    Comparison of results from ExomeCNV and SAAS-CNV on NA18507 WES data.

    (A) Correlation of total read depths between each synthesized tumor-normal pair for ExomeCNV versus SAAS-CNV. (B) The number of false positives for CNV and CN-LOH. (C) The size of false CNV and CN-LOH calls. (D) The number of false CNV calls versus the correlation. In (A) and (D), each dot represents a synthesized pair. In (A), the dashed line indicates the line with slope 1. In (D), the black line indicates the fitted linear regression line for ExomeCNV and the gray line that for SAAS-CNV. In (B) and (C), p-values are based on a paired t-test.

    NA18507 WGS analysis results from SAAS-CNV, Control-FREEC and CNAnorm.

    The number (A) and the size (B) of falsely called alterations are compared between the three methods, stratified by CNVs and CN-LOHs. CNAnorm only detects copy number gain and loss but not CN-LOHs. In (A), the p-values are based on the paired Wilcoxon signed-rank test and have the same value due to the nature of a rank-based test. In (B), the p-value is from a two-sample t-test on the sets of falsely called CNVs (or CN-LOHs) pooled across the 6 pairs for each method.
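One way identical p-values can arise from a rank-based test is sketched below: with 6 pairs, the exact signed-rank p-value depends only on signs and ranks, not on magnitudes. The caption does not state that an exact test was used or that all differences shared a sign, so both are assumptions here, and `signed_rank_exact_p` is a hypothetical helper written for this illustration.

```python
from itertools import product

def signed_rank_exact_p(diffs):
    """Two-sided exact Wilcoxon signed-rank p-value by full enumeration."""
    d = [x for x in diffs if x != 0]          # zero differences are dropped
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:                               # average ranks over tied |d|
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_obs = sum(r for r, x in zip(ranks, d) if x > 0)
    # Under the null, every sign pattern is equally likely: enumerate all 2^n.
    sums = [sum(r for r, s in zip(ranks, signs) if s)
            for signs in product((0, 1), repeat=n)]
    p_low = sum(1 for s in sums if s <= w_obs) / len(sums)
    p_high = sum(1 for s in sums if s >= w_obs) / len(sums)
    return min(1.0, 2 * min(p_low, p_high))

# Two made-up sets of paired differences with very different magnitudes
# but the same sign pattern give the same p-value.
p_small = signed_rank_exact_p([3, 1, 7, 2, 9, 4])
p_large = signed_rank_exact_p([30, 100, 7, 2, 900, 4])
```

In both made-up cases every difference is positive, so the statistic takes its maximum value and the p-value is 2/2^6 regardless of how large the differences are.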

    Comparison of results from six analyses on Dataset II.

    The results from GAP analysis on SNP array data are treated as the benchmark. (A) ROC curves for different analyses, coded in different lines. The overlap rate threshold is 50%. (B) The concordance of SCNA calls (black) and the percentage of consensus segments (blue) for the six analyses (coded the same way as in (A)), as the overlap rate threshold varies.
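The overlap rate threshold can be pictured with a small sketch. The caption does not define the overlap rate, so the definition below (the fraction of a called segment covered by a benchmark segment on the same chromosome) and the names `overlap_rate` and `passes_threshold` are assumptions for illustration.

```python
def overlap_rate(call, benchmark):
    """Fraction of the called segment covered by the benchmark segment.

    Segments are (start, end) tuples in bp on the same chromosome,
    treated as half-open intervals. Assumed definition, not from the paper.
    """
    c_start, c_end = call
    b_start, b_end = benchmark
    overlap = max(0, min(c_end, b_end) - max(c_start, b_start))
    return overlap / (c_end - c_start)

def passes_threshold(call, benchmark, threshold=0.5):
    """True when the overlap rate strictly exceeds the threshold."""
    return overlap_rate(call, benchmark) > threshold

# Made-up coordinates: 60 of the 100 called bases overlap the benchmark.
example_rate = overlap_rate((100, 200), (140, 300))
```

Sweeping `threshold` from 0 to 1 and counting the calls that pass is one way to produce a curve of concordance against the overlap rate threshold.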

    SAAS-CNV: A Joint Segmentation Approach on Aggregated and Allele Specific Signals for the Identification of Somatic Copy Number Alterations with Next-Generation Sequencing Data

    Cancer genomes exhibit profound somatic copy number alterations (SCNAs). Studying tumor SCNAs using massively parallel sequencing provides unprecedented resolution and meanwhile gives rise to new challenges in data analysis, complicated by tumor aneuploidy and heterogeneity as well as normal cell contamination. While the majority of read depth based methods utilize total sequencing depth alone for SCNA inference, the allele specific signals are undervalued. We proposed a joint segmentation and inference approach using both signals to meet some of the challenges. Our method consists of four major steps: 1) extracting read depth supporting reference and alternative alleles at each SNP/Indel locus and comparing the total read depth and alternative allele proportion between tumor and matched normal sample; 2) performing joint segmentation on the two signal dimensions; 3) correcting the copy number baseline from which the SCNA state is determined; 4) calling SCNA state for each segment based on both signal dimensions. The method is applicable to whole exome/genome sequencing (WES/WGS) as well as SNP array data in a tumor-control study. We applied the method to a dataset containing no SCNAs to test the specificity, created by pairing sequencing replicates of a single HapMap sample as normal/tumor pairs, as well as a large-scale WGS dataset consisting of 88 liver tumors along with adjacent normal tissues. Compared with representative methods, our method demonstrated improved accuracy, scalability to large cancer studies, capability in handling both sequencing and SNP array data, and the potential to improve the estimation of tumor ploidy and purity.
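Step 1 of the method can be sketched per locus as follows. The abstract says only that total read depth and alternative allele proportion are compared between tumor and normal; the specific formulas below (a log2 depth ratio, and a log2 ratio of mirrored B-allele frequencies) and the function names are assumptions for illustration, and library-size normalization is deliberately omitted.

```python
import math

def mirrored_baf(ref_depth, alt_depth):
    """Alternative allele proportion folded around 0.5 (assumed definition)."""
    baf = alt_depth / (ref_depth + alt_depth)
    return 0.5 + abs(baf - 0.5)

def locus_signals(t_ref, t_alt, n_ref, n_alt):
    """Return (log2ratio, log2mBAF) for one heterozygous SNP/Indel locus.

    Arguments are read depths supporting the reference and alternative
    alleles in the tumor (t_*) and matched normal (n_*) samples.
    """
    log2ratio = math.log2((t_ref + t_alt) / (n_ref + n_alt))
    log2mbaf = math.log2(mirrored_baf(t_ref, t_alt) / mirrored_baf(n_ref, n_alt))
    return log2ratio, log2mbaf

# Made-up depths: same total depth in both samples, but the tumor
# shows a strong allelic imbalance (90 reference vs 10 alternative reads).
balanced = locus_signals(50, 50, 50, 50)
imbalanced = locus_signals(90, 10, 50, 50)
```

In this toy example the imbalanced locus has a zero log2ratio but a positive log2mBAF, the kind of event that a method relying on total depth alone would not see; the joint segmentation in step 2 operates on both dimensions together.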

    Comparison of SCNA signals between SNP array and WGS from Dataset II.

    (A) log2ratio; (B) log2mBAF. Each dot indicates the median signal value of a genomic segment with concordant SCNA status and >50% overlap between the results from the two platforms using SAAS-CNV. The dashed line indicates the line with slope 1. The solid line is a local polynomial regression (LOESS) fit to the data points.

    SCNA profile for the sample PT017 from Dataset II.

    (A) and (B) display SNP array data and (C) and (D) WGS data. In (A) and (C), the log2ratio signal is plotted against chromosomal position on the top panel and the log2mBAF signal on the bottom panel. The dots, each representing a locus, are colored alternately to distinguish chromosomes. The segments, each representing a DNA segment resulting from the joint segmentation, are colored based on inferred SCNA status. In (B) and (D), on the main log2mBAF-log2ratio panel, each circle corresponds to a segment in (A) and (C), with the size reflecting the length of the segment; the color code is specified in the legend; the dashed gray lines indicate the adjusted baselines. The side panels, corresponding to the log2ratio and log2mBAF dimensions respectively, show the distribution of median values of each segment.

    SCNA profile for the sample PT116 from Dataset II.

    (A) and (B) display SNP array data and (C) and (D) WGS data. In (A) and (C), the log2ratio signal is plotted against chromosomal position on the top panel and the log2mBAF signal on the bottom panel. The dots, each representing a locus, are colored alternately to distinguish chromosomes. The segments, each representing a DNA segment resulting from the joint segmentation, are colored based on inferred SCNA status. In (B) and (D), on the main log2mBAF-log2ratio panel, each circle corresponds to a segment in (A) and (C), with the size reflecting the length of the segment; the color code is specified in the legend; the dashed gray lines indicate the adjusted baselines. The side panels, corresponding to the log2ratio and log2mBAF dimensions respectively, show the distribution of median values of each segment.

    Summary of SCNA results for Dataset II.

    SAAS-CNV and ExomeCNV were applied to synthesized WES data, SAAS-CNV and GAP to SNP array data, and SAAS-CNV, CNAnorm and Control-FREEC to WGS data. (A) Number of loci on autosomes per sample. (B) Number of segments per sample. (C) Number of loci involved in each segment. (D) The size (in bp) of each segment. In each sub-plot, the y-axis is displayed on a log10 scale.